Overview

Dataset statistics

Number of variables: 20
Number of observations: 50000
Missing cells: 50140
Missing cells (%): 5.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 33.7 MiB
Average record size in memory: 707.3 B
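The percentages in the dataset statistics follow directly from the counts. The sketch below recomputes the missing-cell share and approximates the average record size; the variable names are illustrative, and because the total size is rounded to one decimal in the report, the record size only approximates the listed 707.3 B.

```python
# Figures copied from the dataset statistics above.
n_variables = 20
n_observations = 50_000
missing_cells = 50_140
total_size_mib = 33.7  # rounded to one decimal in the report

# Every variable contributes one cell per observation.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells

# 1 MiB = 1024**2 bytes; the result is approximate because of the rounding above.
avg_record_bytes = total_size_mib * 1024**2 / n_observations

print(f"Missing cells: {missing_pct:.1f}%")
print(f"Average record size: about {avg_record_bytes:.0f} B")
```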

Variable types

Text: 4
Categorical: 5
Numeric: 11

Alerts

High correlation: Aromaticity is highly overall correlated with Oxidized_coefficient and 1 other field
High correlation: Function_Prediction_source is highly overall correlated with Protein_source
High correlation: Function_prediction_source is highly overall correlated with Phage_source and 1 other field
High correlation: Molecular_weight is highly overall correlated with Oxidized_coefficient and 1 other field
High correlation: Oxidized_coefficient is highly overall correlated with Aromaticity and 2 other fields
High correlation: Phage_source is highly overall correlated with Function_prediction_source and 1 other field
High correlation: Protein_source is highly overall correlated with Function_Prediction_source and 2 other fields
High correlation: Reduced_coefficient is highly overall correlated with Aromaticity and 2 other fields
High correlation: Start is highly overall correlated with Stop
High correlation: Stop is highly overall correlated with Start
Imbalance: Protein_source is highly imbalanced (93.7%)
Missing: Function_prediction_source has 22808 (45.6%) missing values
Missing: Function_Prediction_source has 27192 (54.4%) missing values
Unique: Protein_ID has unique values
Zeros: Aromaticity has 8208 (16.4%) zeros
Zeros: Instability_index has 763 (1.5%) zeros
Zeros: Helix_fraction has 2122 (4.2%) zeros
Zeros: Turn_fraction has 2961 (5.9%) zeros
Zeros: Sheet_fraction has 2307 (4.6%) zeros
Zeros: Reduced_coefficient has 13681 (27.4%) zeros
Zeros: Oxidized_coefficient has 13180 (26.4%) zeros

Reproduction

Analysis started: 2025-07-21 08:30:44.767931
Analysis finished: 2025-07-21 08:30:58.809486
Duration: 14.04 seconds
Software version: ydata-profiling v0.0.dev0
Download configuration: config.json

Variables

Distinct: 47854
Distinct (%): 95.7%
Missing: 0
Missing (%): 0.0%
Memory size: 4.4 MiB

Length

Max length: 87
Median length: 84
Mean length: 34.57618
Min length: 5

Characters and Unicode

Total characters: 1728809
Distinct characters: 66
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
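As a minimal sketch of what analysing text by character properties can look like, the snippet below counts Unicode general categories with Python's standard `unicodedata` module. It is an illustration only, not the profiler's own implementation; the helper name `category_counts` is hypothetical, and the standard library exposes general categories but not scripts or blocks.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            # e.g. "Lu" = uppercase letter, "Nd" = decimal digit,
            # "Pc" = connector punctuation, "Po" = other punctuation
            counts[unicodedata.category(ch)] += 1
    return counts


# Identifiers taken from the sample rows of this variable.
sample = ["NC_013650.1", "NC_021349.1", "NC_010392.1"]
for category, count in category_counts(sample).most_common():
    print(category, count)
```

For these three identifiers the digits dominate, followed by the uppercase letters and the two punctuation categories contributed by `_` and `.`.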

Unique

Unique: 45815
Unique (%): 91.6%

Sample

1st row: NC_013650.1
2nd row: NC_021349.1
3rd row: NC_010392.1
4th row: NC_021071.1
5th row: NC_019510.1

Words

Value                                                           Count   Frequency (%)
mgv-genome-0379339                                              5       < 0.1%
mgv-genome-0378063                                              4       < 0.1%
kj019095.1                                                      4       < 0.1%
mgv-genome-0341507                                              4       < 0.1%
mgv-genome-0378082                                              4       < 0.1%
imgvr_uvig_3300045988_112928|3300045988|ga0495776_101926        4       < 0.1%
mgv-genome-0376837                                              4       < 0.1%
imgvr_uvig_3300045988_068024|3300045988|ga0495776_021873        3       < 0.1%
uvig_417742                                                     3       < 0.1%
imgvr_uvig_2846161197_000008|2846161197|2846161252|18420-67883  3       < 0.1%
Other values (47844)                                            49962   99.9%

Most occurring characters

Value                 Count   Frequency (%)
0                     189853  11.0%
_                     137979  8.0%
3                     107088  6.2%
1                     90940   5.3%
2                     84114   4.9%
8                     82186   4.8%
5                     79728   4.6%
4                     79075   4.6%
9                     73268   4.2%
7                     70447   4.1%
Other values (56)     734131  42.5%

Most occurring categories

Value                 Count   Frequency (%)
(unknown)             1728809 100.0%

Most occurring scripts

Value                 Count   Frequency (%)
(unknown)             1728809 100.0%

Most occurring blocks

Value                 Count   Frequency (%)
(unknown)             1728809 100.0%

Protein_source
Categorical

High correlation  Imbalance 

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.1 MiB

Length

Max length: 8
Median length: 8
Mean length: 7.96872
Min length: 4

Characters and Unicode

Total characters: 398436
Distinct characters: 23
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: RefSeq
2nd row: RefSeq
3rd row: RefSeq
4th row: RefSeq
5th row: RefSeq

Common Values

Value                 Count   Frequency (%)
prodigal              49124   98.2%
RefSeq                586     1.2%
Genbank               256     0.5%
DDBJ                  19      < 0.1%
EMBL                  15      < 0.1%
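The 93.7% imbalance reported for Protein_source can be reproduced from the counts above with a normalized-entropy score. The sketch below is one definition that matches the reported figure; whether the profiler computes it in exactly this way is an assumption, and `imbalance_score` is an illustrative name.

```python
import math


def imbalance_score(counts):
    """Return 1 - H / log2(k), where H is the Shannon entropy (in bits) of the
    class frequencies and k is the number of classes.
    0 means perfectly balanced; values near 1 mean one class dominates."""
    total = sum(counts)
    k = len(counts)
    if k < 2:
        return 0.0  # assumption: a single class is treated as not imbalanced
    entropy = -sum((c / total) * math.log2(c / total) for c in counts if c > 0)
    return 1 - entropy / math.log2(k)


# prodigal, RefSeq, Genbank, DDBJ, EMBL (counts from the table above)
protein_source_counts = [49124, 586, 256, 19, 15]
score = imbalance_score(protein_source_counts)
print(f"{score:.1%}")
```

A score this close to 1 reflects that a single value, prodigal, accounts for 98.2% of the rows.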

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value                 Count   Frequency (%)
prodigal              49124   98.2%
refseq                586     1.2%
genbank               256     0.5%
ddbj                  19      < 0.1%
embl                  15      < 0.1%

Most occurring characters

Value                 Count   Frequency (%)
a                     49380   12.4%
r                     49124   12.3%
p                     49124   12.3%
o                     49124   12.3%
d                     49124   12.3%
i                     49124   12.3%
g                     49124   12.3%
l                     49124   12.3%
e                     1428    0.4%
R                     586     0.1%
Other values (13)     3174    0.8%

Most occurring categories

Value                 Count   Frequency (%)
(unknown)             398436  100.0%

Most occurring scripts

Value                 Count   Frequency (%)
(unknown)             398436  100.0%

Most occurring blocks

Value                 Count   Frequency (%)
(unknown)             398436  100.0%

Function_prediction_source
Categorical

High correlation  Missing 

Distinct: 7
Distinct (%): < 0.1%
Missing: 22808
Missing (%): 45.6%
Memory size: 3.0 MiB

Length

Max length: 16
Median length: 13
Mean length: 11.525118
Min length: 1

Characters and Unicode

Total characters: 313391
Distinct characters: 31
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: RefSeq
2nd row: RefSeq
3rd row: RefSeq
4th row: RefSeq
5th row: RefSeq

Common Values

Value                 Count   Frequency (%)
eggNOG-mapper         10873   21.7%
Iterative search      10077   20.2%
-                     5366    10.7%
RefSeq                586     1.2%
Genbank               256     0.5%
DDBJ                  19      < 0.1%
EMBL                  15      < 0.1%
(Missing)             22808   45.6%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value                 Count   Frequency (%)
eggnog-mapper         10873   29.2%
iterative             10077   27.0%
search                10077   27.0%
(empty)               5366    14.4%
refseq                586     1.6%
genbank               256     0.7%
ddbj                  19      0.1%
embl                  15      < 0.1%

Most occurring characters

Value                 Count   Frequency (%)
e                     53405   17.0%
a                     31283   10.0%
r                     31027   9.9%
g                     21746   6.9%
p                     21746   6.9%
t                     20154   6.4%
-                     16239   5.2%
G                     11129   3.6%
m                     10873   3.5%
N                     10873   3.5%
Other values (21)     84916   27.1%

Most occurring categories

Value                 Count   Frequency (%)
(unknown)             313391  100.0%

Most occurring scripts

Value                 Count   Frequency (%)
(unknown)             313391  100.0%

Most occurring blocks

Value                 Count   Frequency (%)
(unknown)             313391  100.0%

Start
Real number (ℝ)

High correlation 

Distinct: 34072
Distinct (%): 68.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 28967.163
Minimum: 1
Maximum: 428743
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 1
5-th percentile: 1375.95
Q1: 8914.75
median: 20912.5
Q3: 37411.5
95-th percentile: 87802.15
Maximum: 428743
Range: 428742
Interquartile range (IQR): 28496.75
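The range and interquartile range above are simple differences of the listed quantiles. The sketch below recomputes them for Start and, as an illustrative extension that is not part of the report, derives Tukey's 1.5 × IQR fences, a common rule of thumb for flagging outliers.

```python
# Quantile figures copied from the Start variable above.
minimum, q1, q3, maximum = 1, 8914.75, 37411.5, 428743

value_range = maximum - minimum  # spread of the full data
iqr = q3 - q1                    # spread of the middle 50% of values

# Tukey's fences: a common outlier heuristic, not something the report states.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(value_range, iqr, lower_fence, upper_fence)
```

Under this rule the upper fence sits below the reported 95-th percentile, which is in line with the strong right skew of the variable.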

Descriptive statistics

Standard deviation: 30857.067
Coefficient of variation (CV): 1.065243
Kurtosis: 14.251476
Mean: 28967.163
Median Absolute Deviation (MAD): 13497.5
Skewness: 2.882722
Sum: 1.4483582 × 10^9
Variance: 9.521586 × 10^8
Monotonicity: Not monotonic
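Several of the descriptive statistics are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the sum is the mean times the number of values. A minimal check against the rounded figures for Start, assuming all 50000 observations are present (the variable has no missing values):

```python
# Rounded figures copied from the report for the Start variable.
n = 50_000
mean = 28967.163
std = 30857.067

cv = std / mean       # coefficient of variation
variance = std ** 2   # variance is the square of the standard deviation
total = mean * n      # sum of all values

print(f"CV = {cv:.6f}")
print(f"Variance = {variance:.6e}")
print(f"Sum = {total:.7e}")
```

Small differences in the last digits are expected because the inputs are themselves rounded.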
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value                 Count   Frequency (%)
1                     214     0.4%
3                     182     0.4%
2                     168     0.3%
50                    26      0.1%
1041                  8       < 0.1%
550                   8       < 0.1%
19703                 8       < 0.1%
1651                  7       < 0.1%
14073                 7       < 0.1%
40                    7       < 0.1%
Other values (34062)  49365   98.7%

Minimum 10 values

Value                 Count   Frequency (%)
1                     214     0.4%
2                     168     0.3%
3                     182     0.4%
4                     1       < 0.1%
5                     1       < 0.1%
6                     3       < 0.1%
7                     2       < 0.1%
8                     2       < 0.1%
9                     1       < 0.1%
13                    1       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
428743                1       < 0.1%
424369                1       < 0.1%
413772                1       < 0.1%
408912                1       < 0.1%
361484                1       < 0.1%
356037                1       < 0.1%
350687                1       < 0.1%
339671                1       < 0.1%
336590                1       < 0.1%
330709                1       < 0.1%

Stop
Real number (ℝ)

High correlation 

Distinct: 34620
Distinct (%): 69.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 29650.514
Minimum: 63
Maximum: 428895
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 63
5-th percentile: 1978
Q1: 9648
median: 21621
Q3: 38088.5
95-th percentile: 88481.25
Maximum: 428895
Range: 428832
Interquartile range (IQR): 28440.5

Descriptive statistics

Standard deviation: 30856.762
Coefficient of variation (CV): 1.0406822
Kurtosis: 14.243434
Mean: 29650.514
Median Absolute Deviation (MAD): 13469
Skewness: 2.8816907
Sum: 1.4825257 × 10^9
Variance: 9.5213978 × 10^8
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value                 Count   Frequency (%)
5349                  8       < 0.1%
8960                  7       < 0.1%
3317                  7       < 0.1%
9198                  6       < 0.1%
11271                 6       < 0.1%
8329                  6       < 0.1%
2648                  6       < 0.1%
1780                  6       < 0.1%
717                   6       < 0.1%
6446                  6       < 0.1%
Other values (34610)  49936   99.9%

Minimum 10 values

Value                 Count   Frequency (%)
63                    1       < 0.1%
66                    1       < 0.1%
67                    2       < 0.1%
68                    1       < 0.1%
71                    2       < 0.1%
73                    3       < 0.1%
75                    2       < 0.1%
76                    1       < 0.1%
79                    2       < 0.1%
80                    1       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
428895                1       < 0.1%
424716                1       < 0.1%
414050                1       < 0.1%
409217                1       < 0.1%
361909                1       < 0.1%
357215                1       < 0.1%
351340                1       < 0.1%
340450                1       < 0.1%
336715                1       < 0.1%
331221                1       < 0.1%

Strand
Categorical

Distinct: 2
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 2.8 MiB

Length

Max length: 1
Median length: 1
Mean length: 1
Min length: 1

Characters and Unicode

Total characters: 50000
Distinct characters: 2
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: +
2nd row: +
3rd row: -
4th row: -
5th row: +

Common Values

Value                 Count   Frequency (%)
+                     25009   50.0%
-                     24991   50.0%

Length

[Figure: Histogram of lengths of the category]

Common Values (Plot)

[Figure: common values plot]

Words

Value                 Count   Frequency (%)
(empty)               50000   100.0%

Most occurring characters

Value                 Count   Frequency (%)
+                     25009   50.0%
-                     24991   50.0%

Most occurring categories

Value                 Count   Frequency (%)
(unknown)             50000   100.0%

Most occurring scripts

Value                 Count   Frequency (%)
(unknown)             50000   100.0%

Most occurring blocks

Value                 Count   Frequency (%)
(unknown)             50000   100.0%

Protein_ID
Text

Unique 

Distinct: 50000
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.5 MiB

Length

Max length: 89
Median length: 85
Mean length: 37.44138
Min length: 8

Characters and Unicode

Total characters: 1872069
Distinct characters: 66
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 50000
Unique (%): 100.0%

Sample

1st row: YP_003347791.1
2nd row: YP_008061629.1
3rd row: YP_001700595.1
4th row: YP_007877716.1
5th row: YP_007005441.1

Words

Value                 Count   Frequency (%)
yp_004323762.1        1       < 0.1%
biochar_1198_26       1       < 0.1%
yp_003347791.1        1       < 0.1%
yp_008061629.1        1       < 0.1%
yp_001700595.1        1       < 0.1%
yp_007877716.1        1       < 0.1%
yp_007005441.1        1       < 0.1%
yp_007675378.1        1       < 0.1%
yp_007006505.1        1       < 0.1%
np_899350.1           1       < 0.1%
Other values (49990)  49990   > 99.9%

Most occurring characters

Value                 Count   Frequency (%)
0                     195579  10.4%
_                     187103  10.0%
3                     118922  6.4%
1                     108508  5.8%
2                     97405   5.2%
4                     89323   4.8%
5                     88523   4.7%
8                     88245   4.7%
9                     79243   4.2%
7                     77096   4.1%
Other values (56)     742122  39.6%

Most occurring categories

Value                 Count   Frequency (%)
(unknown)             1872069 100.0%

Most occurring scripts

Value                 Count   Frequency (%)
(unknown)             1872069 100.0%

Most occurring blocks

Value                 Count   Frequency (%)
(unknown)             1872069 100.0%

Distinct: 3990
Distinct (%): 8.0%
Missing: 0
Missing (%): 0.0%
Memory size: 4.0 MiB

Length

Max length: 902
Median length: 761
Mean length: 26.1706
Min length: 2

Characters and Unicode

Total characters: 1308530
Distinct characters: 77
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 1764
Unique (%): 3.5%

Sample

1st row: hypothetical protein
2nd row: HNH endonuclease
3rd row: bacteriophage tail tip assembly protein%3B Lambda gpK homolog
4th row: hypothetical protein
5th row: DNA primase/helicase

Words

Value                 Count   Frequency (%)
unknown               19675   11.6%
protein               12785   7.6%
of                    4714    2.8%
hypothetical          4343    2.6%
the                   4086    2.4%
domain                3814    2.3%
phage                 3180    1.9%
family                2963    1.8%
dna                   2889    1.7%
to                    2099    1.2%
Other values (5219)   108485  64.2%

Most occurring characters

Value                 Count   Frequency (%)
n                     128827  9.8%
(space)               119048  9.1%
e                     101206  7.7%
o                     99962   7.6%
i                     88219   6.7%
t                     82679   6.3%
a                     75942   5.8%
r                     57663   4.4%
s                     49739   3.8%
l                     47978   3.7%
Other values (67)     457267  34.9%

Most occurring categories

Value                 Count   Frequency (%)
(unknown)             1308530 100.0%

Most occurring scripts

Value                 Count   Frequency (%)
(unknown)             1308530 100.0%

Most occurring blocks

Value                 Count   Frequency (%)
(unknown)             1308530 100.0%

Distinct: 64
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 3.2 MiB

Length

Max length: 45
Median length: 9
Mean length: 10.45642
Min length: 6

Characters and Unicode

Total characters: 522821
Distinct characters: 25
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7
Unique (%): < 0.1%

Sample

1st row: hypothetical;
2nd row: packaging;
3rd row: assembly;infection;
4th row: hypothetical;
5th row: replication;

Words

Value                 Count   Frequency (%)
unsorted              27439   54.9%
hypothetical          4341    8.7%
assembly              3639    7.3%
replication           2465    4.9%
infection             1989    4.0%
packaging             1729    3.5%
assembly;infection    1495    3.0%
lysis                 1438    2.9%
integration           1228    2.5%
regulation            1117    2.2%
Other values (54)     3120    6.2%

Most occurring characters

Value                 Count   Frequency (%)
;                     53934   10.3%
e                     50194   9.6%
t                     49640   9.5%
n                     47280   9.0%
o                     43312   8.3%
s                     42415   8.1%
r                     35409   6.8%
u                     30652   5.9%
i                     29958   5.7%
d                     27623   5.3%
Other values (15)     112404  21.5%

Most occurring categories

Value                 Count   Frequency (%)
(unknown)             522821  100.0%

Most occurring scripts

Value                 Count   Frequency (%)
(unknown)             522821  100.0%

Most occurring blocks

Value                 Count   Frequency (%)
(unknown)             522821  100.0%

Molecular_weight
Real number (ℝ)

High correlation 

Distinct: 44528
Distinct (%): 89.2%
Missing: 70
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 4139.025
Minimum: 75.0666
Maximum: 8930.7077
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 75.0666
5-th percentile: 446.43031
Q1: 2030.3248
median: 4195.8407
Q3: 6243.8347
95-th percentile: 7691.756
Maximum: 8930.7077
Range: 8855.6411
Interquartile range (IQR): 4213.5098

Descriptive statistics

Standard deviation: 2369.4824
Coefficient of variation (CV): 0.57247357
Kurtosis: -1.2461612
Mean: 4139.025
Median Absolute Deviation (MAD): 2102.2304
Skewness: -0.042482482
Sum: 2.0666152 × 10^8
Variance: 5614446.9
Monotonicity: Not monotonic
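Molecular_weight has 70 missing values, and the reported sum and mean are consistent with statistics computed over the 49930 non-missing values only. The sketch below checks this with the rounded figures from the report; that the profiler skips missing values here is an inference from the numbers, not something the report states.

```python
# Rounded figures copied from the report for Molecular_weight.
n_observations = 50_000
n_missing = 70
mean = 4139.025
reported_sum = 2.0666152e8

n_valid = n_observations - n_missing
sum_from_valid = mean * n_valid       # mean taken over non-missing values
sum_from_all = mean * n_observations  # would hold only with no missing values

print(f"{sum_from_valid:.7e} (non-missing only)")
print(f"{sum_from_all:.7e} (all rows)")
print(f"{reported_sum:.7e} (reported)")
```

Only the first product agrees with the reported sum to the printed precision.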
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value                 Count   Frequency (%)
131.1729              110     0.2%
146.1876              98      0.2%
147.1293              73      0.1%
174.201               58      0.1%
105.0926              55      0.1%
89.0932               54      0.1%
117.1463              49      0.1%
75.0666               47      0.1%
146.1445              39      0.1%
133.1027              39      0.1%
Other values (44518)  49308   98.6%
(Missing)             70      0.1%

Minimum 10 values

Value                 Count   Frequency (%)
75.0666               47      0.1%
89.0932               54      0.1%
105.0926              55      0.1%
115.1305              18      < 0.1%
117.1463              49      0.1%
119.1192              14      < 0.1%
121.1582              6       < 0.1%
131.1729              110     0.2%
132.1179              39      0.1%
133.1027              39      0.1%

Maximum 10 values

Value                 Count   Frequency (%)
8930.7077             1       < 0.1%
8815.9815             1       < 0.1%
8811.4767             1       < 0.1%
8770.8033             1       < 0.1%
8768.9863             1       < 0.1%
8743.0033             1       < 0.1%
8730.1896             1       < 0.1%
8728.032              1       < 0.1%
8718.777              1       < 0.1%
8712.844              1       < 0.1%

Aromaticity
Real number (ℝ)

High correlation  Zeros 

Distinct: 470
Distinct (%): 0.9%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.089570148
Minimum: 0
Maximum: 1
Zeros: 8208
Zeros (%): 16.4%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.042553191
median: 0.083333333
Q3: 0.125
95-th percentile: 0.2
Maximum: 1
Range: 1
Interquartile range (IQR): 0.082446809

Descriptive statistics

Standard deviation: 0.077318557
Coefficient of variation (CV): 0.86321792
Kurtosis: 28.320356
Mean: 0.089570148
Median Absolute Deviation (MAD): 0.041666667
Skewness: 3.2710967
Sum: 4478.5074
Variance: 0.0059781592
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value                 Count   Frequency (%)
0                     8208    16.4%
0.1428571429          1000    2.0%
0.1                   980     2.0%
0.1111111111          971     1.9%
0.125                 934     1.9%
0.09090909091         928     1.9%
0.1666666667          810     1.6%
0.07692307692         809     1.6%
0.08333333333         789     1.6%
0.07142857143         755     1.5%
Other values (460)    33816   67.6%

Minimum 10 values

Value                 Count   Frequency (%)
0                     8208    16.4%
0.01428571429         14      < 0.1%
0.01449275362         32      0.1%
0.01470588235         21      < 0.1%
0.01492537313         20      < 0.1%
0.01515151515         18      < 0.1%
0.01538461538         36      0.1%
0.015625              17      < 0.1%
0.01587301587         22      < 0.1%
0.01612903226         17      < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
1                     64      0.1%
0.75                  2       < 0.1%
0.6666666667          23      < 0.1%
0.6                   8       < 0.1%
0.5454545455          1       < 0.1%
0.5                   157     0.3%
0.4666666667          2       < 0.1%
0.4615384615          1       < 0.1%
0.4545454545          3       < 0.1%
0.4444444444          8       < 0.1%

Instability_index
Real number (ℝ)

Zeros 

Distinct: 39246
Distinct (%): 78.6%
Missing: 70
Missing (%): 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 35.801315
Minimum: -93.533333
Maximum: 388.53333
Zeros: 763
Zeros (%): 1.5%
Negative: 3351
Negative (%): 6.7%
Memory size: 390.8 KiB

Quantile statistics

Minimum: -93.533333
5-th percentile: -4.2333333
Q1: 17.619216
median: 33.687388
Q3: 50.628983
95-th percentile: 83.915991
Maximum: 388.53333
Range: 482.06667
Interquartile range (IQR): 33.009767

Descriptive statistics

Standard deviation: 29.322439
Coefficient of variation (CV): 0.81903246
Kurtosis: 5.7274053
Mean: 35.801315
Median Absolute Deviation (MAD): 16.518227
Skewness: 1.1393033
Sum: 1787559.7
Variance: 859.80545
Monotonicity: Not monotonic
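The Median Absolute Deviation listed above is a robust alternative to the standard deviation: it is the median of the absolute distances from the median, so a handful of extreme values barely moves it. A minimal sketch using only the standard library follows, assuming the unscaled definition (no consistency constant); the toy data is hypothetical and not taken from the report.

```python
from statistics import median, pstdev


def median_absolute_deviation(values):
    """Median of absolute deviations from the median (no scaling constant)."""
    center = median(values)
    return median(abs(v - center) for v in values)


# Hypothetical toy data with one extreme value.
toy = [1.0, 2.0, 3.0, 4.0, 100.0]
print("MAD:", median_absolute_deviation(toy))
print("Std:", pstdev(toy))
```

On the toy data the single outlier inflates the standard deviation while the MAD stays at the scale of the typical values, which is why both figures are useful side by side for skewed variables such as Instability_index.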
[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value                 Count   Frequency (%)
0                     763     1.5%
5                     434     0.9%
6.666666667           330     0.7%
7.5                   232     0.5%
8                     123     0.2%
-8.98                 96      0.2%
-13.725               95      0.2%
-21.63333333          85      0.2%
55.65                 84      0.2%
-37.45                82      0.2%
Other values (39236)  47606   95.2%

Minimum 10 values

Value                 Count   Frequency (%)
-93.53333333          2       < 0.1%
-79.55                2       < 0.1%
-78                   1       < 0.1%
-74.83333333          1       < 0.1%
-72.525               2       < 0.1%
-71.73333333          4       < 0.1%
-70.15                18      < 0.1%
-69.1                 2       < 0.1%
-68.56666667          5       < 0.1%
-67.65                1       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
388.5333333           1       < 0.1%
291.4                 13      < 0.1%
275.64                1       < 0.1%
269.8                 1       < 0.1%
261.8                 3       < 0.1%
260.1333333           1       < 0.1%
258.3                 1       < 0.1%
258.05                2       < 0.1%
248.6444444           1       < 0.1%
245.5333333           1       < 0.1%

Isoelectric_point
Real number (ℝ)

Distinct: 19985
Distinct (%): 40.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 6.816935
Minimum: 4.0500284
Maximum: 11.999968
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 4.0500284
5-th percentile: 4.0500284
Q1: 4.6183046
median: 6.0690062
Q3: 9.1618761
95-th percentile: 10.6676
Maximum: 11.999968
Range: 7.9499393
Interquartile range (IQR): 4.5435715

Descriptive statistics

Standard deviation: 2.3676714
Coefficient of variation (CV): 0.34732199
Kurtosis: -1.2709006
Mean: 6.816935
Median Absolute Deviation (MAD): 1.9125183
Skewness: 0.41583882
Sum: 340846.75
Variance: 5.6058679
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value                 Count   Frequency (%)
4.050028419           4580    9.2%
5.525000191           762     1.5%
11.99996777           531     1.1%
8.750052071           429     0.9%
9.750021172           251     0.5%
5.240009499           157     0.3%
5.524318123           147     0.3%
11.00083675           141     0.3%
5.57001667            141     0.3%
5.494989204           138     0.3%
Other values (19975)  42723   85.4%

Minimum 10 values

Value                 Count   Frequency (%)
4.050028419           4580    9.2%
4.051619911           1       < 0.1%
4.052074623           1       < 0.1%
4.052131462           1       < 0.1%
4.052586174           3       < 0.1%
4.052699852           3       < 0.1%
4.053779793           1       < 0.1%
4.053836632           1       < 0.1%
4.05395031            2       < 0.1%
4.054007149           2       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
11.99996777           531     1.1%
11.94478283           1       < 0.1%
11.93324299           1       < 0.1%
11.92660275           1       < 0.1%
11.91719036           1       < 0.1%
11.91712589           1       < 0.1%
11.91706142           2       < 0.1%
11.91196842           1       < 0.1%
11.91022778           2       < 0.1%
11.90784245           1       < 0.1%

Helix_fraction
Real number (ℝ)

Zeros 

Distinct: 1156
Distinct (%): 2.3%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.29489187
Minimum: 0
Maximum: 1
Zeros: 2122
Zeros (%): 4.2%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.083333333
Q1: 0.23636364
median: 0.2962963
Q3: 0.35185185
95-th percentile: 0.48484848
Maximum: 1
Range: 1
Interquartile range (IQR): 0.11548822

Descriptive statistics

Standard deviation: 0.12461176
Coefficient of variation (CV): 0.42256765
Kurtosis: 6.2809129
Mean: 0.29489187
Median Absolute Deviation (MAD): 0.057490326
Skewness: 0.92946035
Sum: 14744.593
Variance: 0.015528091
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value                 Count   Frequency (%)
0.3333333333          2615    5.2%
0                     2122    4.2%
0.25                  1619    3.2%
0.2857142857          1057    2.1%
0.5                   1006    2.0%
0.2                   914     1.8%
0.3                   744     1.5%
0.4                   724     1.4%
0.2727272727          623     1.2%
0.375                 573     1.1%
Other values (1146)   38003   76.0%

Minimum 10 values

Value                 Count   Frequency (%)
0                     2122    4.2%
0.01818181818         1       < 0.1%
0.02                  1       < 0.1%
0.02173913043         1       < 0.1%
0.02222222222         1       < 0.1%
0.02325581395         1       < 0.1%
0.0243902439          1       < 0.1%
0.02777777778         2       < 0.1%
0.02941176471         5       < 0.1%
0.0303030303          2       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
1                     301     0.6%
0.875                 2       < 0.1%
0.8571428571          3       < 0.1%
0.8571428571          4       < 0.1%
0.8333333333          2       < 0.1%
0.8333333333          2       < 0.1%
0.8181818182          1       < 0.1%
0.8                   8       < 0.1%
0.8                   1       < 0.1%
0.7857142857          1       < 0.1%

Turn_fraction
Real number (ℝ)

Zeros 

Distinct: 884
Distinct (%): 1.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.2061528
Minimum: 0
Maximum: 1
Zeros: 2961
Zeros (%): 5.9%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0.14285714
median: 0.2
Q3: 0.25531915
95-th percentile: 0.37931034
Maximum: 1
Range: 1
Interquartile range (IQR): 0.11246201

Descriptive statistics

Standard deviation: 0.11343311
Coefficient of variation (CV): 0.55023804
Kurtosis: 9.3726165
Mean: 0.2061528
Median Absolute Deviation (MAD): 0.056410256
Skewness: 1.7154598
Sum: 10307.64
Variance: 0.012867071
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value                 Count   Frequency (%)
0                     2961    5.9%
0.25                  1657    3.3%
0.2                   1603    3.2%
0.1666666667          1399    2.8%
0.3333333333          1173    2.3%
0.1428571429          1045    2.1%
0.2222222222          728     1.5%
0.1818181818          728     1.5%
0.125                 664     1.3%
0.2857142857          647     1.3%
Other values (874)    37395   74.8%

Minimum 10 values

Value                 Count   Frequency (%)
0                     2961    5.9%
0.01754385965         1       < 0.1%
0.01886792453         1       < 0.1%
0.01960784314         1       < 0.1%
0.02                  1       < 0.1%
0.02040816327         2       < 0.1%
0.02127659574         4       < 0.1%
0.02173913043         1       < 0.1%
0.02222222222         1       < 0.1%
0.02380952381         1       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
1                     180     0.4%
0.8888888889          1       < 0.1%
0.88                  1       < 0.1%
0.8666666667          2       < 0.1%
0.8571428571          1       < 0.1%
0.8571428571          1       < 0.1%
0.8333333333          2       < 0.1%
0.8275862069          1       < 0.1%
0.8                   8       < 0.1%
0.75                  32      0.1%

Sheet_fraction
Real number (ℝ)

Zeros 

Distinct: 1005
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 0.25831688
Minimum: 0
Maximum: 1
Zeros: 2307
Zeros (%): 4.6%
Negative: 0
Negative (%): 0.0%
Memory size: 390.8 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.055555556
Q1: 0.1875
median: 0.25
Q3: 0.32
95-th percentile: 0.45454545
Maximum: 1
Range: 1
Interquartile range (IQR): 0.1325

Descriptive statistics

Standard deviation: 0.12661106
Coefficient of variation (CV): 0.49013854
Kurtosis: 6.2826014
Mean: 0.25831688
Median Absolute Deviation (MAD): 0.065789474
Skewness: 1.2605728
Sum: 12915.844
Variance: 0.01603036
Monotonicity: Not monotonic

[Figure: Histogram with fixed size bins (bins=50)]

Common Values

Value                 Count   Frequency (%)
0                     2307    4.6%
0.25                  1847    3.7%
0.3333333333          1803    3.6%
0.2                   1274    2.5%
0.2857142857          906     1.8%
0.1666666667          895     1.8%
0.5                   856     1.7%
0.2222222222          672     1.3%
0.1428571429          649     1.3%
0.2727272727          588     1.2%
Other values (995)    38203   76.4%

Minimum 10 values

Value                 Count   Frequency (%)
0                     2307    4.6%
0.01886792453         1       < 0.1%
0.01960784314         1       < 0.1%
0.02040816327         1       < 0.1%
0.025                 1       < 0.1%
0.02702702703         1       < 0.1%
0.02941176471         1       < 0.1%
0.0303030303          3       < 0.1%
0.03225806452         2       < 0.1%
0.03333333333         3       < 0.1%

Maximum 10 values

Value                 Count   Frequency (%)
1                     272     0.5%
0.8888888889          1       < 0.1%
0.8823529412          1       < 0.1%
0.875                 2       < 0.1%
0.8333333333          5       < 0.1%
0.8                   16      < 0.1%
0.7777777778          1       < 0.1%
0.7777777778          1       < 0.1%
0.7692307692          1       < 0.1%
0.75                  51      0.1%

Reduced_coefficient
Real number (ℝ)

High correlation  Zeros 

Distinct78
Distinct (%)0.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean4977.758
Minimum0
Maximum49500
Zeros13681
Zeros (%)27.4%
Negative0
Negative (%)0.0%
Memory size390.8 KiB
2025-07-21T10:31:02.921999image/svg+xmlMatplotlib v3.9.4, https://matplotlib.org/

Quantile statistics

Minimum                          0
5-th percentile                  0
Q1                               0
median                           2980
Q3                               7450
95-th percentile                 15470
Maximum                          49500
Range                            49500
Interquartile range (IQR)        7450

Descriptive statistics

Standard deviation               5519.6204
Coefficient of variation (CV)    1.1088567
Kurtosis                         2.8873211
Mean                             4977.758
Median Absolute Deviation (MAD)  2980
Skewness                         1.5171578
Sum                              2.488879 × 10^8
Variance                         30466209
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value              Count  Frequency (%)
0                  13681  27.4%
1490               8144   16.3%
2980               5037   10.1%
6990               3129   6.3%
5500               2871   5.7%
4470               2818   5.6%
8480               2476   5.0%
9970               1624   3.2%
5960               1557   3.1%
12490              1073   2.1%
Other values (68)  7590   15.2%

Minimum 10 values

Value  Count  Frequency (%)
0      13681  27.4%
1490   8144   16.3%
2980   5037   10.1%
4470   2818   5.6%
5500   2871   5.7%
5960   1557   3.1%
6990   3129   6.3%
7450   708    1.4%
8480   2476   5.0%
8940   279    0.6%

Maximum 10 values

Value  Count  Frequency (%)
49500  1      < 0.1%
46980  1      < 0.1%
45490  3      < 0.1%
44000  1      < 0.1%
41940  1      < 0.1%
41480  1      < 0.1%
40450  2      < 0.1%
39990  3      < 0.1%
38960  3      < 0.1%
38500  1      < 0.1%

Oxidized_coefficient
Real number (ℝ)

High correlation  Zeros 

Distinct      217
Distinct (%)  0.4%
Missing       0
Missing (%)   0.0%
Infinite      0
Infinite (%)  0.0%
Mean          4996.358
Minimum       0
Maximum       49500
Zeros         13180
Zeros (%)     26.4%
Negative      0
Negative (%)  0.0%
Memory size   390.8 KiB

Quantile statistics

Minimum                          0
5-th percentile                  0
Q1                               0
median                           2980
Q3                               7450
95-th percentile                 15720
Maximum                          49500
Range                            49500
Interquartile range (IQR)        7450

Descriptive statistics

Standard deviation               5529.528
Coefficient of variation (CV)    1.1067117
Kurtosis                         2.8697698
Mean                             4996.358
Median Absolute Deviation (MAD)  2980
Skewness                         1.5137876
Sum                              2.498179 × 10^8
Variance                         30575680
Monotonicity                     Not monotonic
Histogram with fixed size bins (bins=50)

Common Values

Value               Count  Frequency (%)
0                   13180  26.4%
1490                7491   15.0%
2980                4413   8.8%
6990                2645   5.3%
5500                2581   5.2%
4470                2382   4.8%
8480                2071   4.1%
9970                1324   2.6%
5960                1280   2.6%
12490               914    1.8%
Other values (207)  11719  23.4%

Minimum 10 values

Value  Count  Frequency (%)
0      13180  26.4%
125    416    0.8%
250    75     0.1%
375    8      < 0.1%
500    2      < 0.1%
1490   7491   15.0%
1615   537    1.1%
1740   93     0.2%
1865   17     < 0.1%
1990   5      < 0.1%

Maximum 10 values

Value  Count  Frequency (%)
49500  1      < 0.1%
46980  1      < 0.1%
45490  3      < 0.1%
44000  1      < 0.1%
41940  1      < 0.1%
41480  1      < 0.1%
40450  2      < 0.1%
40115  1      < 0.1%
39990  2      < 0.1%
38960  3      < 0.1%
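
The values listed for Reduced_coefficient and Oxidized_coefficient are consistent with sums of multiples of 1490, 5500 and 125 (for example 6990 = 5500 + 1490, and 1615 = 1490 + 125). That is the pattern given by the commonly used ProtParam-style estimate of the molar extinction coefficient at 280 nm (Tyr 1490, Trp 5500, cystine 125). The report does not say how these columns were computed, so the following is only a sketch under that assumption; the function name `extinction_coefficients` and the example sequence are hypothetical.

```python
def extinction_coefficients(sequence):
    """Estimate (reduced, oxidized) molar extinction coefficients at 280 nm.

    Assumes ProtParam-style constants: 1490 per Tyr, 5500 per Trp, and
    125 per cystine (one disulfide bond per pair of Cys residues).
    """
    seq = sequence.upper()
    reduced = 1490 * seq.count("Y") + 5500 * seq.count("W")
    oxidized = reduced + 125 * (seq.count("C") // 2)
    return reduced, oxidized


# Hypothetical sequence with one Trp, one Tyr and two Cys residues
print(extinction_coefficients("MWAYCKLC"))  # (6990, 7115)
```

Under this assumption, a zero in either column simply means the sequence contains no Trp or Tyr (and, for the oxidized value, fewer than two Cys), which would explain the large share of zeros reported for both variables.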

Phage_source
Categorical

High correlation 

Distinct      14
Distinct (%)  < 0.1%
Missing       0
Missing (%)   0.0%
Memory size   2.9 MiB

Value             Count
IMG_VR            13979
MGV               12188
GPD               8874
GOV2              6028
TemPhD            4021
Other values (9)  4910

Length

Max length     8
Median length  7
Mean length    4.349
Min length     3

Characters and Unicode

Total characters     217450
Distinct characters  29
Distinct categories  1
Distinct scripts     1
Distinct blocks      1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  RefSeq
2nd row  RefSeq
3rd row  RefSeq
4th row  RefSeq
5th row  RefSeq

Common Values

Value             Count  Frequency (%)
IMG_VR            13979  28.0%
MGV               12188  24.4%
GPD               8874   17.7%
GOV2              6028   12.1%
TemPhD            4021   8.0%
CHVD              2273   4.5%
GVD               840    1.7%
RefSeq            586    1.2%
PhagesDB          393    0.8%
IGVD              368    0.7%
Other values (4)  450    0.9%
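
A frequency table of this shape (top values, counts, percentages, and a pooled "Other values" row) can be produced with a simple counter. The following is a minimal sketch on toy data, not the code used to generate this report; the helper name `frequency_table` and its `top` parameter are hypothetical.

```python
from collections import Counter


def frequency_table(values, top=10):
    """Return (value, count, percent) rows for the most common values."""
    counts = Counter(values)
    total = sum(counts.values())
    rows = [(v, c, round(100 * c / total, 1)) for v, c in counts.most_common(top)]
    # Pool everything outside the top-N into a single remainder row
    other = total - sum(c for _, c, _ in rows)
    if other:
        rows.append(("Other values", other, round(100 * other / total, 1)))
    return rows


# Toy input, not the real Phage_source column
sample = ["IMG_VR"] * 3 + ["MGV"] * 2 + ["GPD"]
for row in frequency_table(sample, top=2):
    print(row)
```

The percentages in the table above follow the same arithmetic: for instance, 13979 of 50000 observations rounds to 28.0%.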

Length

Histogram of lengths of the category

Words

Value             Count  Frequency (%)
img_vr            13979  28.0%
mgv               12188  24.4%
gpd               8874   17.7%
gov2              6028   12.1%
temphd            4021   8.0%
chvd              2273   4.5%
gvd               840    1.7%
refseq            586    1.2%
phagesdb          393    0.8%
igvd              368    0.7%
Other values (4)  450    0.9%

Most occurring characters

Value              Count  Frequency (%)
G                  42533  19.6%
V                  35836  16.5%
M                  26182  12.0%
D                  16807  7.7%
R                  14565  6.7%
I                  14347  6.6%
_                  13979  6.4%
P                  13288  6.1%
O                  6028   2.8%
2                  6028   2.8%
Other values (19)  27857  12.8%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  217450  100.0%

Most frequent character per category

(unknown)

Value              Count  Frequency (%)
G                  42533  19.6%
V                  35836  16.5%
M                  26182  12.0%
D                  16807  7.7%
R                  14565  6.7%
I                  14347  6.6%
_                  13979  6.4%
P                  13288  6.1%
O                  6028   2.8%
2                  6028   2.8%
Other values (19)  27857  12.8%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  217450  100.0%

Most frequent character per script

(unknown)

Value              Count  Frequency (%)
G                  42533  19.6%
V                  35836  16.5%
M                  26182  12.0%
D                  16807  7.7%
R                  14565  6.7%
I                  14347  6.6%
_                  13979  6.4%
P                  13288  6.1%
O                  6028   2.8%
2                  6028   2.8%
Other values (19)  27857  12.8%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  217450  100.0%

Most frequent character per block

(unknown)

Value              Count  Frequency (%)
G                  42533  19.6%
V                  35836  16.5%
M                  26182  12.0%
D                  16807  7.7%
R                  14565  6.7%
I                  14347  6.6%
_                  13979  6.4%
P                  13288  6.1%
O                  6028   2.8%
2                  6028   2.8%
Other values (19)  27857  12.8%

Function_Prediction_source
Categorical

High correlation  Missing 

Distinct      3
Distinct (%)  < 0.1%
Missing       27192
Missing (%)   54.4%
Memory size   2.8 MiB

Value             Count
-                 12530
eggNOG-mapper     8666
Iterative search  1612

Length

Max length     16
Median length  1
Mean length    6.6196072
Min length     1

Characters and Unicode

Total characters     150980
Distinct characters  18
Distinct categories  1
Distinct scripts     1
Distinct blocks      1

The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique      0
Unique (%)  0.0%

Sample

1st row  eggNOG-mapper
2nd row  eggNOG-mapper
3rd row  eggNOG-mapper
4th row  eggNOG-mapper
5th row  eggNOG-mapper

Common Values

Value             Count  Frequency (%)
-                 12530  25.1%
eggNOG-mapper     8666   17.3%
Iterative search  1612   3.2%
(Missing)         27192  54.4%

Length

Histogram of lengths of the category

Common Values (Plot)

[Plot of common values; image not preserved]

Words

Value          Count  Frequency (%)
(empty)        12530  51.3%
eggnog-mapper  8666   35.5%
iterative      1612   6.6%
search         1612   6.6%

Most occurring characters

Value             Count  Frequency (%)
e                 22168  14.7%
-                 21196  14.0%
g                 17332  11.5%
p                 17332  11.5%
a                 11890  7.9%
r                 11890  7.9%
G                 8666   5.7%
O                 8666   5.7%
N                 8666   5.7%
m                 8666   5.7%
Other values (8)  14508  9.6%

Most occurring categories

Value      Count   Frequency (%)
(unknown)  150980  100.0%

Most frequent character per category

(unknown)

Value             Count  Frequency (%)
e                 22168  14.7%
-                 21196  14.0%
g                 17332  11.5%
p                 17332  11.5%
a                 11890  7.9%
r                 11890  7.9%
G                 8666   5.7%
O                 8666   5.7%
N                 8666   5.7%
m                 8666   5.7%
Other values (8)  14508  9.6%

Most occurring scripts

Value      Count   Frequency (%)
(unknown)  150980  100.0%

Most frequent character per script

(unknown)

Value             Count  Frequency (%)
e                 22168  14.7%
-                 21196  14.0%
g                 17332  11.5%
p                 17332  11.5%
a                 11890  7.9%
r                 11890  7.9%
G                 8666   5.7%
O                 8666   5.7%
N                 8666   5.7%
m                 8666   5.7%
Other values (8)  14508  9.6%

Most occurring blocks

Value      Count   Frequency (%)
(unknown)  150980  100.0%

Most frequent character per block

(unknown)

Value             Count  Frequency (%)
e                 22168  14.7%
-                 21196  14.0%
g                 17332  11.5%
p                 17332  11.5%
a                 11890  7.9%
r                 11890  7.9%
G                 8666   5.7%
O                 8666   5.7%
N                 8666   5.7%
m                 8666   5.7%
Other values (8)  14508  9.6%

Interactions

[Figure: 121 pairwise interaction plots; images not preserved]

Correlations

Variables, in row and column order:

 1  Aromaticity
 2  Function_Prediction_source
 3  Function_prediction_source
 4  Helix_fraction
 5  Instability_index
 6  Isoelectric_point
 7  Molecular_weight
 8  Oxidized_coefficient
 9  Phage_source
10  Protein_source
11  Reduced_coefficient
12  Sheet_fraction
13  Start
14  Stop
15  Strand
16  Turn_fraction

                                 1       2       3       4       5       6       7       8       9      10      11      12      13      14      15      16
Aromaticity                  1.000   0.000   0.016   0.454  -0.015  -0.009   0.206   0.596   0.018   0.000   0.600  -0.237   0.038   0.038   0.005  -0.035
Function_Prediction_source   0.000   1.000   0.000   0.036   0.015   0.035   0.039   0.006   0.324   1.000   0.005   0.050   0.067   0.066   0.000   0.063
Function_prediction_source   0.016   0.000   1.000   0.041   0.000   0.019   0.023   0.000   0.823   1.000   0.000   0.008   0.080   0.079   0.053   0.030
Helix_fraction               0.454   0.036   0.041   1.000  -0.142  -0.059   0.063   0.238   0.019   0.011   0.243  -0.065   0.049   0.048   0.012  -0.195
Instability_index           -0.015   0.015   0.000  -0.142   1.000  -0.029   0.169   0.085   0.009   0.000   0.081   0.146  -0.003  -0.004   0.004   0.021
Isoelectric_point           -0.009   0.035   0.019  -0.059  -0.029   1.000   0.056   0.031   0.022   0.000   0.032  -0.276  -0.015  -0.015   0.000   0.001
Molecular_weight             0.206   0.039   0.023   0.063   0.169   0.056   1.000   0.627   0.011   0.001   0.620   0.037   0.001  -0.001   0.000   0.019
Oxidized_coefficient         0.596   0.006   0.000   0.238   0.085   0.031   0.627   1.000   0.006   0.000   0.998  -0.116   0.014   0.012   0.000   0.008
Phage_source                 0.018   0.324   0.823   0.019   0.009   0.022   0.011   0.006   1.000   1.000   0.006   0.022   0.069   0.069   0.054   0.020
Protein_source               0.000   1.000   1.000   0.011   0.000   0.000   0.001   0.000   1.000   1.000   0.000   0.000   0.058   0.058   0.040   0.007
Reduced_coefficient          0.600   0.005   0.000   0.243   0.081   0.032   0.620   0.998   0.006   0.000   1.000  -0.113   0.014   0.012   0.000   0.007
Sheet_fraction              -0.237   0.050   0.008  -0.065   0.146  -0.276   0.037  -0.116   0.022   0.000  -0.113   1.000  -0.022  -0.025   0.000  -0.325
Start                        0.038   0.067   0.080   0.049  -0.003  -0.015   0.001   0.014   0.069   0.058   0.014  -0.022   1.000   0.999   0.000  -0.020
Stop                         0.038   0.066   0.079   0.048  -0.004  -0.015  -0.001   0.012   0.069   0.058   0.012  -0.025   0.999   1.000   0.000  -0.015
Strand                       0.005   0.000   0.053   0.012   0.004   0.000   0.000   0.000   0.054   0.040   0.000   0.000   0.000   0.000   1.000   0.006
Turn_fraction               -0.035   0.063   0.030  -0.195   0.021   0.001   0.019   0.008   0.020   0.007   0.007  -0.325  -0.020  -0.015   0.006   1.000
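
To pull the strongest relationships out of a matrix like this one, the off-diagonal coefficients can be filtered by absolute value. The following is a minimal sketch using a handful of coefficients copied from the matrix above; the helper name `high_correlation_pairs` is hypothetical, and the 0.5 cutoff is an assumption, not a setting taken from this report.

```python
def high_correlation_pairs(coefficients, threshold=0.5):
    """Return (a, b, r) tuples with |r| >= threshold, strongest first."""
    selected = [
        (a, b, r) for (a, b), r in coefficients.items() if abs(r) >= threshold
    ]
    return sorted(selected, key=lambda item: abs(item[2]), reverse=True)


# A few coefficients copied from the matrix above
coefficients = {
    ("Start", "Stop"): 0.999,
    ("Oxidized_coefficient", "Reduced_coefficient"): 0.998,
    ("Molecular_weight", "Oxidized_coefficient"): 0.627,
    ("Aromaticity", "Reduced_coefficient"): 0.600,
    ("Aromaticity", "Helix_fraction"): 0.454,
    ("Sheet_fraction", "Turn_fraction"): -0.325,
}

for a, b, r in high_correlation_pairs(coefficients):
    print(f"{a} ~ {b}: {r:+.3f}")
```

With the assumed cutoff, the first four pairs are kept and the last two are dropped.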

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
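
Nullity correlation can be illustrated as the Pearson correlation between the present/missing indicators of two columns. The following is a minimal sketch on toy data, assuming missing values are represented as `None`; the function name `nullity_correlation` is hypothetical and this is not necessarily how the profiling tool computes its heatmap.

```python
import math


def nullity_correlation(col_a, col_b):
    """Pearson correlation between the 'value present' indicators of two columns."""
    a = [0.0 if v is None else 1.0 for v in col_a]
    b = [0.0 if v is None else 1.0 for v in col_b]
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # a fully present or fully missing column has no variance
    return cov / math.sqrt(var_a * var_b)


# Toy columns in which exactly one of the two is filled on every row
left = ["RefSeq", None, "RefSeq", None]
right = [None, "-", None, "eggNOG-mapper"]
print(nullity_correlation(left, right))  # -1.0
```

In this report, Function_prediction_source and Function_Prediction_source have 22808 and 27192 missing values, which together equal the 50000 observations. If no row has both columns filled, their nullity correlation would be -1, as in the toy example.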

Sample

First rows

Row | Phage_ID | Protein_source | Function_prediction_source | Start | Stop | Strand | Protein_ID | Product | Protein_classification | Molecular_weight | Aromaticity | Instability_index | Isoelectric_point | Helix_fraction | Turn_fraction | Sheet_fraction | Reduced_coefficient | Oxidized_coefficient | Phage_source | Function_Prediction_source
0 | NC_013650.1 | RefSeq | RefSeq | 64948 | 65748 | + | YP_003347791.1 | hypothetical protein | hypothetical; | 6435.9765 | 0.107143 | 35.989286 | 5.868706 | 0.250000 | 0.178571 | 0.178571 | 11460 | 11460 | RefSeq | NaN
1 | NC_021349.1 | RefSeq | RefSeq | 81651 | 81983 | + | YP_008061629.1 | HNH endonuclease | packaging; | 4363.8986 | 0.075000 | 61.070000 | 8.070620 | 0.275000 | 0.300000 | 0.175000 | 11000 | 11125 | RefSeq | NaN
2 | NC_010392.1 | RefSeq | RefSeq | 12850 | 13449 | - | YP_001700595.1 | bacteriophage tail tip assembly protein%3B Lambda gpK homolog | assembly;infection; | 6944.7367 | 0.118644 | 39.355932 | 9.416139 | 0.237288 | 0.152542 | 0.237288 | 19480 | 19605 | RefSeq | NaN
3 | NC_021071.1 | RefSeq | RefSeq | 106486 | 106767 | - | YP_007877716.1 | hypothetical protein | hypothetical; | 2866.2480 | 0.173913 | 111.039130 | 5.274624 | 0.434783 | 0.086957 | 0.260870 | 4470 | 4470 | RefSeq | NaN
4 | NC_019510.1 | RefSeq | RefSeq | 10478 | 12196 | + | YP_007005441.1 | DNA primase/helicase | replication; | 1412.3706 | 0.166667 | 38.191667 | 4.050028 | 0.166667 | 0.166667 | 0.166667 | 5500 | 5500 | RefSeq | NaN
5 | NC_020857.1 | RefSeq | RefSeq | 24889 | 25374 | + | YP_007675378.1 | hypothetical protein | hypothetical; | 2274.4383 | 0.047619 | 18.695238 | 5.663801 | 0.190476 | 0.142857 | 0.047619 | 0 | 0 | RefSeq | NaN
6 | NC_019519.1 | RefSeq | RefSeq | 30637 | 31104 | + | YP_007006505.1 | hypothetical protein | hypothetical; | 1923.2424 | 0.133333 | 174.693333 | 10.834379 | 0.333333 | 0.133333 | 0.333333 | 5500 | 5500 | RefSeq | NaN
7 | NC_005083.2 | RefSeq | RefSeq | 61372 | 61992 | + | NP_899350.1 | hypothetical protein | hypothetical; | 7769.5234 | 0.151515 | 47.701515 | 4.846286 | 0.348485 | 0.212121 | 0.242424 | 22460 | 22585 | RefSeq | NaN
8 | NC_004589.1 | RefSeq | RefSeq | 5022 | 5225 | + | NP_795671.1 | hypothetical protein | hypothetical; | 7866.6832 | 0.134328 | 52.959701 | 4.093794 | 0.402985 | 0.194030 | 0.343284 | 15930 | 15930 | RefSeq | NaN
9 | NC_011811.1 | RefSeq | RefSeq | 12804 | 12938 | + | YP_002456045.1 | hypothetical protein | hypothetical; | 4856.8811 | 0.068182 | 39.104545 | 9.182957 | 0.431818 | 0.136364 | 0.363636 | 0 | 0 | RefSeq | NaN

Last rows

Row | Phage_ID | Protein_source | Function_prediction_source | Start | Stop | Strand | Protein_ID | Product | Protein_classification | Molecular_weight | Aromaticity | Instability_index | Isoelectric_point | Helix_fraction | Turn_fraction | Sheet_fraction | Reduced_coefficient | Oxidized_coefficient | Phage_source | Function_Prediction_source
49990 | biochar_2976 | prodigal | NaN | 5288 | 5470 | + | biochar_2976_10 | unknown | unsorted; | 6655.6389 | 0.050000 | 58.293333 | 4.050028 | 0.350000 | 0.233333 | 0.416667 | 6990 | 6990 | STV | -
49991 | biochar_953 | prodigal | NaN | 35109 | 35303 | + | biochar_953_49 | unknown | unsorted; | 7437.2190 | 0.093750 | 44.607813 | 4.425791 | 0.234375 | 0.218750 | 0.250000 | 22000 | 22125 | STV | -
49992 | biochar_1080 | prodigal | NaN | 23883 | 24353 | + | biochar_1080_38 | virion structural protein | assembly;infection; | 1650.7364 | 0.000000 | 72.950000 | 4.050028 | 0.312500 | 0.312500 | 0.250000 | 0 | 0 | STV | Iterative search
49993 | biochar_2262 | prodigal | NaN | 5373 | 7160 | + | biochar_2262_4 | hypothetical protein | hypothetical; | 3901.3247 | 0.085714 | 61.511429 | 4.860382 | 0.257143 | 0.314286 | 0.257143 | 1490 | 1490 | STV | eggNOG-mapper
49994 | biochar_4542 | prodigal | NaN | 9883 | 10386 | - | biochar_4542_21 | unknown | unsorted; | 2939.4031 | 0.111111 | 22.537037 | 4.050028 | 0.407407 | 0.296296 | 0.296296 | 5500 | 5500 | STV | -
49995 | biochar_1665 | prodigal | NaN | 23018 | 23911 | + | biochar_1665_32 | glycosyl transferase family 8 | immune; | 2185.4872 | 0.117647 | 60.864706 | 9.821001 | 0.294118 | 0.058824 | 0.058824 | 1490 | 1490 | STV | Iterative search
49996 | biochar_1081 | prodigal | NaN | 499 | 1506 | - | biochar_1081_2 | Part of the outer membrane protein assembly complex, which is involved in assembly and insertion of beta-barrel proteins into the outer membrane | integration;assembly; | 5763.4226 | 0.090909 | 37.052727 | 4.245270 | 0.254545 | 0.309091 | 0.254545 | 8480 | 8480 | STV | eggNOG-mapper
49997 | biochar_2772 | prodigal | NaN | 13710 | 13892 | - | biochar_2772_19 | unknown | unsorted; | 6793.7002 | 0.066667 | 71.563333 | 10.509265 | 0.166667 | 0.283333 | 0.116667 | 6990 | 7115 | STV | -
49998 | biochar_5776 | prodigal | NaN | 6680 | 7210 | - | biochar_5776_16 | unknown | unsorted; | 3992.5418 | 0.111111 | 72.466667 | 11.001481 | 0.222222 | 0.250000 | 0.250000 | 6990 | 6990 | STV | -
49999 | biochar_1198 | prodigal | NaN | 15560 | 16285 | + | biochar_1198_26 | unknown | unsorted; | 3391.8735 | 0.032258 | 74.732258 | 5.967549 | 0.161290 | 0.290323 | 0.387097 | 5500 | 5500 | STV | -